Using Siphon to query the NetCDF Subset Service

Unidata Python Workshop


Objectives

  1. Learn what Siphon is
  2. Employ Siphon's NCSS class to retrieve data from a THREDDS Data Server (TDS)
  3. Plot a map using numpy arrays, matplotlib, and cartopy!

Introduction:

Siphon is a python package that makes downloading data from Unidata data technologies a breeze! In our examples, we'll focus on interacting with the netCDF Subset Service (NCSS) as well as the radar server to retrieve grid data and radar data.

But first! Bookmark these resources for when you want to use Siphon later!

Let's get started!

First, we'll import the TDSCatalog class from Siphon and put the special 'matplotlib' line in so our map will show up later in the notebook. Let's construct an instance of TDSCatalog pointing to our dataset of interest. In this case, I've chosen the TDS' "Best" virtual dataset for the GFS global 0.25 degree collection of GRIB files. This will give us a good resolution for our map. This catalog contains a single dataset.


In [ ]:
%matplotlib inline
from siphon.catalog import TDSCatalog
best_gfs = TDSCatalog('http://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/'
                      'Global_0p25deg/catalog.xml?dataset=grib/NCEP/GFS/Global_0p25deg/Best')
best_gfs.datasets

We pull out this dataset and call subset() to set up requesting a subset of the data.


In [ ]:
best_ds = list(best_gfs.datasets.values())[0]
ncss = best_ds.subset()

We can then use the ncss object to create a new query object, which facilitates asking for data from the server.


In [ ]:
query = ncss.query()

We can look at the ncss.variables object to see what variables are available from the dataset:


In [ ]:
ncss.variables

We construct a query asking for data corresponding to a latitude and longitude box where 43 lat is the northern extent, 35 lat is the southern extent, 260 long is the western extent and 249 is the eastern extent. Note that longitude values are the longitude distance from the prime meridian. We request the data for the current time. This request will return all surface temperatures for points in our bounding box for a single time. Note the string representation of the query is a properly encoded query string.


In [ ]:
from datetime import datetime
query.lonlat_box(north=43, south=35, east=260, west=249).time(datetime.utcnow())
query.accept('netcdf4')
query.variables('Temperature_surface')

We now request data from the server using this query. The NCSS class handles parsing this NetCDF data (using the netCDF4 module). If we print out the variable names, we see our requested variables, as well as a few others (more metadata information)


In [ ]:
from xarray.backends import NetCDF4DataStore
import xarray as xr

data = ncss.get_data(query)
data = xr.open_dataset(NetCDF4DataStore(data))

list(data)

We'll pull out the temperature variable.


In [ ]:
temp_3d = data['Temperature_surface']

We'll pull out the useful variables for latitude, and longitude, and time (which is the time, in hours since the forecast run). Notice the variable names are labeled to show how many dimensions each variable is. This will come in to play soon when we prepare to plot. Try printing one of the variables to see some info on the data!


In [ ]:
# Helper function for finding proper time variable
def find_time_var(var, time_basename='time'):
    for coord_name in var.coords:
        if coord_name.startswith(time_basename):
            return var.coords[coord_name]
    raise ValueError('No time variable found for ' + var.name)

In [ ]:
time_1d = find_time_var(temp_3d)
lat_1d = data['lat']
lon_1d = data['lon']
time_1d

Now we make our data suitable for plotting. We'll import numpy so we can combine lat/longs (meshgrid) and remove one-dimensional entities from our arrays (squeeze). Also we'll use netCDF4's num2date to change the time since the model run to an actual date.


In [ ]:
import numpy as np
from netCDF4 import num2date
from metpy.units import units

# Reduce the dimensions of the data and get as an array with units
temp_2d = temp_3d.metpy.unit_array.squeeze()

# Combine latitude and longitudes 
lon_2d, lat_2d = np.meshgrid(lon_1d, lat_1d)

Now we can plot these up using matplotlib. We import cartopy and matplotlib classes, create our figure, add a map, then add the temperature data and grid points.


In [ ]:
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from metpy.plots import ctables

# Create a new figure
fig = plt.figure(figsize=(15, 12))

# Add the map and set the extent
ax = plt.axes(projection=ccrs.PlateCarree())
ax.set_extent([-100.03, -111.03, 35, 43])

# Retrieve the state boundaries using cFeature and add to plot
ax.add_feature(cfeature.STATES, edgecolor='gray')

# Contour temperature at each lat/long
contours = ax.contourf(lon_2d, lat_2d, temp_2d.to('degF'), 200, transform=ccrs.PlateCarree(),
                       cmap='RdBu_r')
#Plot a colorbar to show temperature and reduce the size of it
fig.colorbar(contours)

# Make a title with the time value
ax.set_title(f'Temperature forecast (\u00b0F) for {time_1d[0].values}Z', fontsize=20)

# Plot markers for each lat/long to show grid points for 0.25 deg GFS
ax.plot(lon_2d.flatten(), lat_2d.flatten(), linestyle='none', marker='o',
        color='black', markersize=2, alpha=0.3, transform=ccrs.PlateCarree());

Exercise

Create your own map using the same projection as above but plot different data variables such as dewpoint or relative humidity.

  1. Explore the variables available in the NCSS dataset by printing NCSS.variables
  2. Change the latitude/longitude values for the request and the map to a region of your own interest!
  3. If you're feeling bold, pass in a different TDSCatalog reference url (such as the GFS half degree). Take a look at the full TDS catalog here.